HIVE-27006: Fix ParallelEdgeFixer #4043
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
set hive.exec.max.dynamic.partitions.pernode=4000;
set hive.exec.max.dynamic.partitions=10000;
set hive.exec.parallel.thread.number=32;
set hive.exec.parallel=false;
why do you need to set all these irrelevant options?
removed irrelevant configurations
Map 14 <- Map 6 (BROADCAST_EDGE), Union 12 (CONTAINS)
Map 5 <- Map 6 (BROADCAST_EDGE), Union 2 (CONTAINS)
Map 6 <- Reducer 8 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)
Map 6 <- Reducer 10 (BROADCAST_EDGE), Reducer 8 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)
This seems odd to me: I wonder why Map 6 suddenly needs Reducer 10?
- there are no changes in Map 6
- Map 6 refers only to RS_47 and RS_51, so it's odd that it has 3 Reducers
The current ParallelEdgeFixer (PEF) does not update RuntimeValueInformation (RVI) correctly. Because TezCompiler creates SemiJoin edges based on RVI, this causes some edges to be missing.
The edge between Map 6 and Reducer 10 is one of the missing edges. After SWO, Map 6 has 2 incoming SemiJoin edges that come from the same reducer, so PEF inserts SEL-RS to prevent a parallel edge, but it does not update the RVI of the parent of the inserted SEL-RS. That is why the previous plan does not contain an edge between Map 6 and Reducer 10.
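To make the parallel-edge condition concrete, here is a minimal sketch in plain Java. It does not use any Hive classes: `Edge`, `ParallelEdgeSketch`, and `findParallelPairs` are hypothetical names, and vertices are plain strings. It only reports which (source, target) pairs are connected by more than one edge, which is the situation PEF resolves by inserting SEL-RS.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an edge between two vertices; not a Hive class.
final class Edge {
  final String source;
  final String target;

  Edge(String source, String target) {
    this.source = source;
    this.target = target;
  }
}

final class ParallelEdgeSketch {
  // Returns "source -> target" for every vertex pair linked by more than one edge.
  static List<String> findParallelPairs(List<Edge> edges) {
    // Count edges per (source, target) pair, preserving first-seen order.
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (Edge e : edges) {
      counts.merge(e.source + " -> " + e.target, 1, Integer::sum);
    }
    List<String> parallel = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      if (entry.getValue() > 1) {
        parallel.add(entry.getKey());
      }
    }
    return parallel;
  }
}
```

In this model, two edges from the same reducer into the same map vertex are reported once, while a single edge from another reducer is not reported.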
I attached 3 operator graphs to illustrate this. All graphs were generated during the TPCDS30TB-query2 test.
Before applying PEF:

boolean notTraverseable = !traverseableEdgeTypes.contains(opEdge.getEdgeType());
boolean notInvertible = (s instanceof ReduceSinkOperator) &&
    !ParallelEdgeFixer.colMappingInverseKeys((ReduceSinkOperator) s).isPresent();

return notTraverseable || notInvertible;
I think it's better not to change something that is not broken...
The previous version eagerly avoided calling ParallelEdgeFixer#colMappingInverseKeys when the edge type did not match. What if, for some reason, it starts throwing exceptions for irrelevant cases?
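The concern above comes down to Java's short-circuit `||`: when the left operand is true, the right operand is never evaluated. The sketch below is a hypothetical stand-in with no Hive classes. `expensiveCheck` plays the role of the invertibility check, and the `calls` counter exists only to show when it runs. It assumes, as the comment describes, that the earlier version returned before reaching the check when the edge type did not match.

```java
final class ShortCircuitSketch {
  // Counts how often the stand-in check runs; for illustration only.
  static int calls = 0;

  // Hypothetical stand-in for a check that may be costly or may throw.
  static boolean expensiveCheck(boolean value) {
    calls++;
    return value;
  }

  // Eager form: the check always runs, even if the edge type already decides the result.
  static boolean eager(boolean notTraverseable, boolean invertible) {
    boolean notInvertible = !expensiveCheck(invertible);
    return notTraverseable || notInvertible;
  }

  // Short-circuit form: the check runs only when notTraverseable is false.
  static boolean lazy(boolean notTraverseable, boolean invertible) {
    return notTraverseable || !expensiveCheck(invertible);
  }
}
```

Both forms return the same boolean for the same inputs; they differ only in whether the check is invoked when `notTraverseable` is already true.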
…an Haindrich, Denys Kuzmenko) Closes apache#4043
What changes were proposed in this pull request?
ParallelEdgeFixer now refers to RowSchema when inverting columns, and updates RuntimeValueInfo as well as SemiJoinBranchInfo.
Why are the changes needed?
The current ParallelEdgeFixer updates SemiJoinBranchInfo but does not update RuntimeValueInfo.
Since TezCompiler refers to RuntimeValueInfo when adding SemiJoin edges to a Tez DAG, the inconsistency between RuntimeValueInfo and SemiJoinBranchInfo causes some SemiJoin edges to be missing at Tez runtime.
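The following is a simplified model of why the two structures have to be updated together. It uses plain Java maps rather than Hive's actual data structures, and every name in it (`SemiJoinBookkeepingSketch`, `branchInfo`, `runtimeValues`, `rewire`, `edgesToAdd`) is hypothetical. It assumes only what is stated above: edges are generated by walking the runtime-value bookkeeping, so an entry that was not moved to the new RS produces no edge.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SemiJoinBookkeepingSketch {
  // Hypothetical: semijoin source RS -> target table scan.
  final Map<String, String> branchInfo = new HashMap<>();
  // Hypothetical: semijoin source RS -> runtime value description.
  final Map<String, String> runtimeValues = new HashMap<>();

  void register(String rs, String targetScan, String runtimeValue) {
    branchInfo.put(rs, targetScan);
    runtimeValues.put(rs, runtimeValue);
  }

  // Moves the bookkeeping from oldRs to newRs. When updateRuntimeValues is
  // false, only branchInfo is moved, which models the inconsistency.
  void rewire(String oldRs, String newRs, boolean updateRuntimeValues) {
    String target = branchInfo.remove(oldRs);
    if (target != null) {
      branchInfo.put(newRs, target);
    }
    if (updateRuntimeValues) {
      String value = runtimeValues.remove(oldRs);
      if (value != null) {
        runtimeValues.put(newRs, value);
      }
    }
  }

  // Walks runtimeValues; a stale key finds no branch entry and yields no edge.
  List<String> edgesToAdd() {
    List<String> edges = new ArrayList<>();
    for (String rs : runtimeValues.keySet()) {
      String target = branchInfo.get(rs);
      if (target != null) {
        edges.add(rs + " -> " + target);
      }
    }
    return edges;
  }
}
```

In this model, calling `rewire` with `updateRuntimeValues` set to false leaves `edgesToAdd()` empty, while setting it to true yields the edge from the new RS.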
Another problem with ParallelEdgeFixer is that colMappingInverseKeys() can return an incorrect result.
In the current implementation, colMappingInverseKeys() depends on Operator.getColumnExprMap(), but I found that this method sometimes returns an empty map even though the Operator contains columns. (The method's comment also says that it returns only key columns for RS and GBY operators.)
When this happens, ParallelEdgeFixer inserts a SEL operator without any columns, and its child RS operator eventually fails with a runtime error.
Does this PR introduce any user-facing change?
No
How was this patch tested?
I tested the patch manually on a cluster using the query described in the JIRA issue and the TPC-DS queries.